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Abstract 


The displacement distribution of a water molecular is characterized mathematically as Gaussianity without con¬ 
sidering potential diffusion barriers and compartments. However, this is not true in real scenario: most biological 
tissues are comprised of cell membranes, various intracellular and extracellular spaces, and of other compartments, 
where the water diffusion is referred to have a non-Gaussian distribution. Diffusion kurtosis imaging (DKI), recently 
considered to be one sensitive biomarker, is an extension of diffusion tensor imaging, which quantifies the degree of 
non-Gaussianity of the diffusion. This work proposes an efficient scheme of maximum likelihood estimation (MLE) 
in DKI: we start from the Rician noise model of the signal intensities. By augmenting a Von-Mises distributed latent 
phase variable, the Rician likelihood is transformed to a tractable joint density without loss of generality. A fast 
computational method, an expectation-maximization (EM) algorithm for MLE is proposed in DKI. To guarantee the 
physical relevance of the diffusion kurtosis we apply the ternary quartic (TQ) parametrization to utilize its positivity, 
which imposes the upper bound to the kurtosis. A Fisher-scoring method is used for achieving fast convergence of 
the individual diffusion compartments. In addition, we use the barrier method to constrain the lower bound to the 
kurtosis. The proposed estimation scheme is conducted on both synthetic and real data with an objective of healthy 
human brain. We compared the method with the other popular ones with promising performance shown in the results. 

Keywords Barrier method, constrained Fisher scoring, data augmentation, constraint, Cholesky, DKI, MLE, non- 
Gaussian, positivity, Rician, TQ, Von Mises. 

1 Introduction 

Magnetic resonance (MR) is capable of measuring the displacement diffusion of water molecules and provides a 
unique insight into image contrasts reflecting anatomical architectures inside organic tissues. Diffusion tensor imaging 
(DTI) is one of the noninvasive imaging modalities based on the diffusion weighted (DW-) MR measurements. It 
captures the neurostructural information by means of diffusion tensors, where the probability of the water diffusion is 
simply assumed to be Gaussian. However, this assumption is argued to diverge significantly from the genuine in many 
biological tissues, especially in human brain containing an appendage of complex microstructural-rich tissues, i.e. cell 
membranes, boundaries and other complex compartments, where the displacement distribution is no longer Gaussian. 

Diffusion kurtosis imaging (DKI) is recently referred as a natural extension of DTI (8] QTj [24] and as one of high 
angular resolution diffusion imaging (HARDI, III] El) techniques. It attempts to quantify the degree of diffusional 
deviation from the Gaussian density expressed by lfl4l l24l [20l BT1 [22] 
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where b is the diffusion weighting amplitudes or so-called b value, D app := g T Dg = L is called the 

n,^2=i 

apparent diffusional coefficient, and K app is the apparent diffusion kurtosis with the further derivation 
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where "tr" denotes the trace of the matrix operator, and tr(D) = Y tr{D). The definition of kurtosis tensor Wf :] ,i 2 ,e 3 ,c 4 

i— 1 

can be found in fl4l . We follow [8] and l24l and list three constraints in DKI: 

# 1. The physical relevance and biological plausibility require that D is positive definite. 

# 2. K app > 0 is the lower bound constraint on the apparent diffusion kurtosis, although in theory K app > —2. This 

lower bound is in agreement with higher order (> 4) tensors in HARDI, depicting the complex structural infor¬ 
mation of fibers in the brain. It further implies that the fourth symmetric kurtosis tensor W should be positive 
definite in three dimension (3d). 

#3. The upper bound constraint is K app < 3 /(bD app ). This limit is derived from the assumption that the signal 
intensity Sib) is a monotonically decreasing function of the ^-amplitudes. In other words, DKI can utilize 
^-values only less than 3000 s/mm 2 , which however is much more feasible in clinic imaging protocols. 

This paper has fourfold contributions: 1) We use Von Mises data augmentation to transform the non-linear Rician like¬ 
lihood into the joint likelihood in the general linear framework. This strategy provides a possibility to use the original 
Rician noise model in MRI with dramatically reduced the computational burden. 2) We propose a fast computational 
scheme for MLE in DKI by the EM algorithm. 3) The three constraints of the kurtosis tensor are explicitly adopted 
into the modeling, where we apply the ternary quartic (TQ) theory to guarantee the positivity of the kurtosis tensor 
with new parametrization. 4) We apply the barrier method combining with the Fisher scoring algorithm in DKI to 
complete the constraint #3. 

2 Theory 

2.1 MR noise and Rician magnitude 

We first recall the noise e in the raw MR-acquisitions which is composed of two i.i.d. Gaussian random variables, e r 
and £,-, with zero mean, and variance a 2 specified from the real and imaginary components, respectively. The joint 

/ £ 2 +£ 2 \ 

density of the MR noise is expressed by p s a i{e r , £;) = exp I — ~r~ 1. The magnitude Y of the MR signal, as a 
consequence, is Rician distributed with the likelihood function 

Ps,a 2 (y) = ^2 ex p(-'o(f)l(S> 0 ), (2) 

where S denotes the signal intensity corrupted by the complex valued noise having the magnitude Y = S + £ = 
(,S + £,) 2 + £ 2 , 1a(;) is the a-order modified Bessel function of the first kind, and X (■) is the indicator function. 

2.2 Von Mises augmentation 

Let (p be the phase data defined as (p := arg ^5 + £ r + i£,^ £ [0, 27z) such that S + E r = Y cos(<p) and £, = Y sin(tp). By 
the Jacobian transformation, the joint density of (p and Y with the parameters S and a 2 is 

Ps,o*(y> <p) = ^2 ex P (“2^2 (y cos M - S ) 2 - ^J 2 sin(<p) 2 ) 

= 2^2 eXP ( _ 2o^^ 2+52_25;VCOS ^^) = p s,<y 2 ^Ps^((P\y): (3) 

where the conditional density 

Ps ’ M) = 2nIo(Sy/<J 2 ) exp (^T cos( ^ ) )’ <P e [ 0 ’ 27r )- (4) 

is an instance of the Von Mises distribution on the unit circle symmetric around zero. Note that although in theory 
the zero magnitude is obtained with zero probability density, in practice, we can still acquire zero measurements after 
discretization by the scanner. In such a case the MR noise contains only the real Gaussian component and the data has 

a Gaussian likelihood p S a 2 (e r = —S , £,■ = 0) = exp 
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2.3 DTI and DKI 


Under the typical assumption of Gaussian approximation of the diffusion displacement density of water molecules, 
the DTI signal model can be expressed in the form S(b) = Soexp(-bDapp) with parametrization —bD^pp = Z 6 by 

S = S 0 exp(Z(b,g)G), (5) 


where D\p p := }_ )_ ■ ■ ■ }_ D/ t f 2 e„ge l ge 2 " '£4 with even number n G N. The tensor parameter is denoted by 0 

4=14=1 4 =i ’ 

and Z is a design matrix. For a rank-2 DTI model, the six distinct elements of D are defined as the vector parameter 
= (01,..., 0g) T := (Dn,D 22 ,D 33 ,Di 2 ,Di 3 ,D 23 ) . The corresponding design matrix, composed of m acquisitions 
is given by 
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With the new parametrization, the DKI model Eq.(|T]) can be further presented by 

S(b) =S 0 exp(-b £ gh8( 2 Dei,e 2 + ^r{ L “T^ 1 ) 2 E £4 £4 £4 £4 w 4,4,4 4 
V 4,4=1 0 4=1 J 4,4,44=1 

= Sq&xp(ZdQd +ZwOw(tr(D) -,W)), 
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where the design matrix can be e.g. {Z w e K" ,x15 : Z w . = ^{g\j,g 2 j,g?,jM\jg 2 j, 
6g]jglj; bg\jg\j , 1 2g 2 ]jg 2jg3j ; 12g 1 jgljg^j A2gi jgljgy , 4g]jg2j4g]jg3j , 4gyg 1 j , 
4 82j83jAgljgijAgljg2j)J = 1 • • -m}. 


2.4 Constrained DKI and its reparametrization 


Since D is a 3 x 3 symmetric positive definite matrix, it can be always written in terms of a product of two triangular 
matrices, D = UU T by Cholesky decomposition. Without changing the design matrix Zq, we can write the tensor 
parameter Qp, as a function of L: Gp>(L) = (L 2 ,L% + L 2 { 1 L 2 + L 2 +L 2 6 ,L\L^ 1 L\L$ 1 L^L^ and U is a 3 x 3 lower 

f Ll ) 

triangular matrix U = \ L 4 Lq constructed from the elements of L. The Jacobian V pGp> is 

\L 5 U L 3 J 
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The constraints #1 and #2 in DKI (see page 2) require that W app should be non-negative. In DTI this positive constraint 
is typically solved by Hilbert’s Theorem liTOl . proving that any real valued positive function can be written as a sum 
of three squares of quadratic forms. For a rank-4 tensor, the widely used methods are based on the strategy of Ternary 
quartic (TQ). It turns out that the non-negative TQ’s of a non-negative 3d kurtosis tensor have an expression 


Wapp = 



= v t QQ t \ = x t G\, 


(9) 
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where v = [g\,gl,gj,gig 2 ,gig 3 ,g 2 g 3 ] T , and Q = [q x \q 2 \q 2 \ is a 6 x 3 matrix, containing three 6 x 1 vectors q t . The 
Gram matrix G = QQ r is a 6 x 6 positive symmetric matrix composed of all fifteen kurtosis tensor elements plus six 




' vv 


free parameters (see |4) for details). Let Qq := tr(D) \ qi , and Pj = -g- 
the signal acquisition j. Then Eq. ([7j can be then written by 

m / 

S =S 0 £ exp ( Z Dj Q D (L) + 0 /jPjOq ). 


vv 


vv 1 


is an 18 x 18 matrix at 
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3 Maximum likelihood and weighted least squares methods with constraints 

3.1 Constrained MLE by EM algorithm 

In the optimization of the likelihood, we employ the EM (Expectation - Maximization) algorithm for maximum like¬ 
lihood estimation with constraints (CMLE) in DKI. The theory of the EM algorithm can be found in textbooks, e.g. 
03. It typically proceeds in two steps and shortens the computational complexity by using augmented data: in the 
E-step we calculate the expectation of the log likelihood w.r.t the conditional distribution of the latent variable given 
the observations, the other parameters having fixed values; in the M-step, we find the ML parameter of Sjj and a 2 by 
maximizing the augmented joint log likelihood quantities. 

For concreteness, in our data augmentation we are able to work with the joint logarithmic likelihood derived from 
Eq. ([3]) and Eq. ( |T()| under the Rician density of the signal data. After omitting the constant, the joint log-likelihood 
function is given by 

l m ( / 

m\og(o~ 2 ) — ^ L | Y j + S Q ex P ( 2Z £>j °d + 2 Qq Pj Qq 
— 2cos((pj)YjSo exp (Z Dj Q D + QqPj % ) ), (11) 


by the EM algorithm for MLE in DKI. For simplifying the notations, we define =exp (ZojQ^), i//j^ = exp (^{Q'qY PjQ^ 
and Tp = y / (cos(<py)) (/ '^ with < ■ > being introduced as a shorthand for expectation. 

In the EM-iteration, given the current parameter estimates {Qjp,Q q\s q , (cr 2 )^), we update the conditional expec¬ 
tation of cos 0 w.r.t. the conditional Von Mises distribution of (j) in Eq. 0by 


/i y y s?k]VV 2 ) w 

(cos <Py) ( 4- j. -y ■ (12) 

IotYjSptPYfio- 2 )^) 


This formula is fairly easy to obtain from the first moment of the Von Mises distribution. 
In the M-step we update Sq and a 2 by their modes with the recursions 


c(*+l) 

^0 


iyi(c PnvfY 


and 


(o- 2 )(* +1 ) 




1 

2 (m— 1) 


rf + ffi. 


|2 (C, W ) 2 (V^f ) ) 2 -2 s[ 






(13) 


(14) 


where m is the number of acquisitions in each voxel. 

To find the optimal (the mode) of parameters 0d and Qq we use the Laplace approximation for the joint likelihood 
w.r.t to Qd and Qq, (the marginal pdf n(y\Q d, Qq) ) , respectively, with Gaussian forms. This can be conducted by 
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applying the Fisher scoring method (also referred as Gauss Newton method) to minimize the objective function, the 
minus Eq. {IT}, given by 


1 m r / 

/(©) : =pL (Soexp (2Z Dj e D (L)+2egPj9Q 


-2cos((pj)YjS 0 exp (Z Dj 9 D (L) + OqPjOq 


(15) 


The essential difference between the Fisher scoring method and the traditional Newton’s method is that we use the 
Fisher or the empirical Fisher information instead of the Hessian matrix. To imposed the constraints, we apply barrier 
method (see e.g. S221). These can be achieved by using the MATLAB optimization toolbox, function fmincon with 
interior point algorithm, where the following statements are required in order to apply this function. 

For updating 9, we first update L, formulating the optimization problem: 


m 

minimize f(L) — /i Y ln(vy) 

7=1 


subject to gj{L) — V j = 0 

with gj(L) = (el) 


j = l,-" ,m. 


Vj > 0 


' 6 

3 

1 



7b z °\ 


0 d (L) 


(A-+1) 


gj(L) < 0, 


(16) 


where the function —/iln(v ; ) built a "barrier" close to the boundary of the cone M'” preventing v ; being close to the 
boundary. The positive scalar /i is called barrier parameter which should be decreasing at each iteration, and A is the 
Lagrangian multiplier. 

The Fisher information, being the expectation of observed information matrix, is given by 


{-/(L«)):=E[//(L,A)]=E[v 2 /(L)+^;V 2 ^W] 

7=1 

d 2 9 d (L) 


J[(v 2 f(d D ))JL + vf(0 D ) 


dL^dL], 


Y ^ i M Dj 


7=1 


= (O w iyl ^(^V(cf ) ) 2 (^ ) ) 2 Zfl 7 ZD;- 4 " ) # ) cf g>f ] z T Dj z Dj y L 

m 

+ Y ^i Md j- 

7=1 


with 


M Dj :=V 2 gj(L) 
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223,7 


The gradient ( \/f(L ) G R. d ) of f(L) at the current recursion is 

V/(T W) = (ct“ 2 ) W £|(4 ft) ) 2 (Cf ) ) 2 (V0 W ) 2 ^ Z D J -sfxfzfWpJiZly, 

and V g(L^)=^Z Dj J^. 


Note that this method works with single and multiple shells with different b values, meaning that b can be a scalar 
or a vector. In addition, before calculating the Fisher information, we use the regularization technique to smooth 
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the Hessian matrix (i.f.f. it is singular) by adding a scalar: H(L^ k \k^) —»■ H(L^ k \X^>) + ||§(0®),A||I, where I 
is an identical matrix with dimension d x of,with Go £ We define the score (8(L^,e A) := y f(L^ + 

L \/kjg(L^), and || • 11 is the norm operator. The norm 11 § (0 W), A11 isone optimal choice of the Levengerg-Marquart 

7=1 

parameter j26l before calculating the Fisher information. It makes the algorithm much more stable by avoiding 
the singularity of the Fisher or empirical Fisher information. Moreover, the barrier parameter is implicitly inside 
the MATLAB solver when calling fmincon, the interior point method, monitoring the decreasing situation is one 
stopping criteria of this method. Finally, we map back to Go by Gp 1 11 ■£- t/^ +1 ) (U T ) < ' k+1 ' > . Detailed interpretation of 
the calculation can be found in Appendix [A] 

Using the same idea, we update Gq by solving the following Lagrangian of problem: 


m 

minimize / () — /i ^ In () 

7=1 


subject to gj (Gq) — Vj = 0, j = 1, ■ • • , m. Vj > 0 


with 


gj(e Q ) = (Q^ k+ v 



b q 


3 

b 



&d(L), 


8j(Gq) < 0. 


(17) 


For simplification, we use the same notations (A, fi , and v) of what are used for as the general parameters when talking 
about the barrier method in the work, which of course will change case by case. Also we should emphasize that the 
constrained functions g(-) are derived from K app < 3 /(bD app ). Particularly, in this case we use the empirical Fisher 
information (also referred as the observed information matrix), being equal to the minus of Hessian matrix, H{Gq 1 A), 
given by 


/{Q ( q) = - v 2 /(%) - E hjV 2 8j(0 Q ) 


7=1 


= E18(4" ) ) 2 (Cf ) 2 (rf ) ) 2 0e^7 5 ,0 e +2(^ ) ) 2 (Cf ) 2 (wf ] ) 2 Pj 

7=1 


- xf cf Vj k) e^PjPjd Q - 2Sy k) x f 0 k> y) k, Pj - E 2A 7 

J 7=1 

and the gradient of /(0g) is 


jW-Wt- (*)„,(*) i 




V/(%) = (<y~ 2 ) k E 1 2(S ik) ) 2 (^ k) )Hvf ) ) 2 PjG Q -2S {k) xf ) ^ k) Yf ) P j G Q 

7=1 ^ 
and 

V g(G Q )=2(G£)W\^P J 


Again we use the regularization technique to smooth H(Gq, A) before calculating f (G ^ 1 ). Finally, we extract Gw 

- ^-2 

from the Gram matrix Q by G = Q T Q/tr(D) . The treatment of the singularity of Fisher or empirical Fisher in¬ 
formation (( — ^{Gq), respectively) and the detailed barrier method combing with the Fisher scoring are 
discussed in Appendix [B] 

To implement the proposed MLE scheme, we need to find the optimal © = (0£>(L), Gq) by the constrained Fisher 
scoring (CFS) methods at each iteration when updating So and <J 2 by using Eq. ( fl3| and Eq. respectively. 


This can be done by vectorizing the score §(©, A) 


fS(L, A)\ 
We,A)j 


and presenting the Fisher information ^f (0, A) by a 


sparse matrix 






Considering objective data from the brain which usually contain millions 


of voxels, this means that the proposed scheme may yield heavy computation. In practise, it is possible to update So 
and a 1 by Eq. ( p~3] l and Eq. ( | 14| till the optimal had been found, then update 0. The algorithm will stop until the 
tolerance reached by monitoring the value of logarithmic likelihood calculated from Eq. ©■ 
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3.2 Constrain weighted least square- CWLS 

The weighted least squares (WLS) is a commonly used method in diffusion MRI, see e.g. 1271121 and ®. Here we 
just impose the constraints and emphasize the problem solving under the proposed scheme. 

The objective function is constructed from Eq. (jTOji to minimize the gap between the observations and signal 
intensities, which is given by 

1 / \ 2 
/(©(L,e e )) = - £ w,-(tog Yj - log So-z Dj o D {L)-e Q PjeQ j ( 18 ) 


with the same constrained functions mentioned before. The choice of weights are free, some possibilities including 
Yj, Sj or Sj /.S’q, etc. . In this work, we choose weights, wy = Yj/ Sj and use the fixed values of So from the WLS. The 
Hessian matrices w.r.t to L and 9q are, respectively. 


■7=1 


H cwls (L,X) = J[ ( £ wj (log Yj — log Sq — Z Dj Q D (L) - QqPjQq )Z T Dj Z Dj 4+ £ XjM h 


7=1 


7=1 


h cwIs (0q,X) = 4 £ Wjl log Yj - \ og So-z Dj e D (L) - e Q Pje Q e T Q pJ e Q Pj 


7=1 


-2 £wj( log Yj - log So - z Dj e D (l) - e Q Pj d Q )p j+ 2 £ a, [ ^ 4 ] . 


7=1 


Then we have a sparse Hessian matrix // cvv / 5 (0, A) 


Hcwls ij-'i ^ ) 


H cw ls ( @Q •> ^ ) 


4 Results 

The results are composed of two parts: in the first part we simulate two different datasets, conduct the estimation 
scheme under the proposed method and popular methods including the constrained weighted least squares (CWLS) 
and the traditional MLE with constraints. Einally, we reveal the performance through comparison. Part 2 is an 
experiment on real data from a healthy volunteer with depiction of some tensor-derived image contrasts from mean 
diffusivity (MD) fractional anisotropy (PA), mean kurtosis (MK), radial kurtosis K \, as well as SNR (:= So/a). 


4.1 Simulation study 

The DW-MRI measurements are simulated from two models: 1) a biexponential model of signal decay mm, and 
we can calculate the apparent diffusion and kurtosis coefficients D app and K app analytically by 


Dapp — finP^in T (1 fin )Pex- and 

K — If (1 — f \ ( Din ~ Dex ) 2 
J^app — JJmy 1 Jin) n2 5 


(19) 


where D,„ and D ex are intracelluar and extracelluar diffusion coefficients, respectively, and f n is the fast diffusion 
relative size fraction. 2) True signal model of DKI from 0 . where we randomly choose certain amount of voxels from 
a publish data resource http://academicdepartments.musc.edu/cbi/dki/dke.html and set the tensor parameters estimated 
by lt22l as the ground truth, and then corrupt the simulated signals by pre-defined Rician noise. Note that we have 
reordered the parameters in correspondence of the design matrices Z defined in Section 2.3. SNR were chosen within 
the range of [8,40], and we fixed non-attenuation diffusion to be So = 1, so that the Rician noise <7 = 1/SNR. Then 
the ground truth can be analytically calculated from Eq. 0 . 


Synthetic experiment 1 In this experiment, we simulate two datasets. In dataset 1, we simulated 6 voxels from six 
different region of interest (ROI): gray matter next to cerebration fluid (GM/CSF); gray matter next to white matter 
(GM/WM); thalamus (TH); putamen and globus pallidus (PU/GP); internal capsule white matter (ICWM); frontal 
white matter (FWM), respectively, with the reference is to Ifl5l and also shown in Table [4] The gradient scheme 
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contains 30 directions which were chosen to acquire the human dataset. Since D app and K app can be calculated 
analytically by Eq. ( [T9| from fl5) . and according to the constraint K app < 3/ (bD app ), we calculated a maximum 
b value = 3532 s/miir which fits all the ROIs. In practise, however, we choose b value < 3000 s/mm 2 to avoid the 
numerical problems which may encounter in computation. Again we took an appropriate range of b values partially 
acquired the human dataset: 62, 249, 560, 996, 1556, 2240 s/mm 2 . In dataset 2, we randomly choose 18 voxels and 
simulate MRI measurements from Eq. 0 as described above by using the same gradients and b values. The aim of 
this experiment is to compare the performance ( e.g. the accuracy and speed ) of the WLS, CWLS, MLE methods, and 
in addition, we also use the CWLS method proposed by (8) using SQP MATLAB solver, CWLS_SQP in short, where 
we do not use the user-defined Hessian matrices but the default values provided by the solver in the computation. 

Table 1: Ground truth (GT) of the diffusion-scalar statistics of six different ROI from dataset 1 


ROI 

MD [mm 1 /s x 10~ J ] 

FA 

MK 


GM/CSF 

0.9263 

0.0669 

0.3128 

0.4267 

GM/WM 

0.8595 

0.0331 

0.4748 

0.0421 

TH 

0.9371 

0.0700 

0.8204 

0.4772 

PU/GP 

0.7814 

0.0282 

0.5449 

0.2849 

FWM 

0.8351 

0.0583 

0.6990 

0.7052 

ICWM 

0.8336 

0.0193 

0.8914 

0.4735 


Table 2: Variance of the diffusion-scalar statistics of dataset 1 


ROI 

Methods 

MD [mm 1 /s x 10“ y ] 

FA [xlO^ 2 ] 

MK 

K i 

GM/CSF 

WLS 

0.8178 

0.5324 

0.1199 

0.4964 


CWLS 

0.4538 

0.2036 

0.7949 

0.0035 


CWLS-SQP 

0.0875 

0.7321 

1.4916 

0.0157 


MLE 

0.6244 

0.1916 

1.0574 

0.0830 

GM/WM 

WLS 

1.4451 

1.6296 

0.1937 

1.0722 


CWLS 

5.0117 

0.1598 

0.1598 

0.6841 


CWLS-SQP 

3.3669 

1.6200 

0.2479 

0.3409 


MLE 

1.1027 

1.2616 

0.1524 

1.4538 

TH 

WLS 

0.3780 

1.0613 

0.0255 

0.0422 


CWLS 

4.1144 

0.5919 

0.5919 

0.0006 


CWLS-SQP 

0.9223 

0.7387 

0.6046 

1.3901 


MLE 

0.0032 

0.5170 

0.3909 

1.6857 

PU/GP 

WLS 

0.0125 

8.7861 

0.1570 

0.2890 


CWLS 

1.7518 

7.1319 

7.1319 

0.0591 


CWLS-SQP 

0.0076 

4.1835 

7.6477 

0.0012 


MLE 

0.6757 

4.2028 

3.8034 

0.0010 

FWM 

WLS 

0.0400 

0.0423 

0.0075 

0.3998 


CWLS 

9.5057 

0.1406 

1.5176 

0.2678 


CWLS-SQP 

3.3476 

0.1147 

2.4464 

0.0424 


MLE 

0.3208 

0.0085 

1.5472 

0.0013 

ICWM 

WLS 

0.9640 

2.3897 

0.6123 

0.3749 


CWLS 

1.5569 

2.3897 

0.0342 

0.0099 


CWLS-SQP 

0.0955 

3.4572 

0.1036 

1.2634 


MLE 

0.7171 

2.1623 

0.0137 

0.2415 


The values of the six ROIs were taken from ED. The results are collected from dataset 1, calculating the variance 
between the estimates and the ground truth in Table[Tj and the SNR is fixed to 15. As we can see from Table[2j the 
CWLS and the MLE methods presented in this work perform better in average than the other two, epsecially in the 
ROI-internal capsule white matter. The performance of WLS method is also good, especially in the ROI-frontal white 
matter compare with the results by the MLE, due to the low noise level of the data. 


8 




















Table 3: Mean square error (MSE) of the diffusion-scalar statistics from dataset 2 


Methods 

MD [mm 1 /s x 10 7 ] 

FA 

MK 

K ± 

DT 

KT 

WLS 

0.244 

0.0134 

0.4700 

0.5430 

0.0564 

0.7555 

CWLS 

5.451 

0.0196 

0.2112 

0.2520 

0.1071 

0.9352 

CWLS-SQP 

0.392 

0.0202 

0.2344 

0.9196 

0.0653 

0.9385 

MLE 

0.080 

0.0125 

0.0101 

0.4533 

0.0341 

0.7831 


The ground truth of MD, FA, MK, K are on an average 1.7711 mm 2 /s x 10 3 , 0.1334, 0.5991, 0.6160, respectively. 
The average computational time per voxel of CWLS, CWLS-SQP and MLE are 4.3733, 1.4565 and 1.2908 seconds 
(sec.), respectively. The percentages of voxels violating constraint #1, #2 and # 3 are 0, 14.81% and 0, respectively. 
Again in dataset 2, we fix SNR to be 15. As can be seen the proposed MLE method performed slightly better than 
other methods. WLS method works also good due to low-noise level and low percentage of voxels violation of the 
constraints. 

During the simulation, we fix SNR to be 15 for both datasets. From dataset 1, we can have a general understanding 
of information diffusion from the six different tissues by computing MD, FA, MK, and K \, etc. . The results present 
in Table [T] and set as ground truth of dataset 1. Then we estimate the tensor parameters and compare the performance 
by different methods from those tissues, respectively, by means of the variance and list the results in Table [2] From 
the table, we can see that the proposed CWLS and MLE methods perform better on an average than the other two. 
The log-normal model works well due to the low-noise level of this dataset. Moreover, the anisotropic level of this 
dataset is very weak due to the selected tissues from the reference. In dataset 2, we compute the mean square errors 
(MSE) of the diffusion statistics, including the MSE of the diffusion coefficients (DT) and the diffusion kurtosis (KT) 
as well. We also monitor the computational time per voxel on average and the percentages of voxels violation of the 
constraints. The results in Table[3]show us that the MLE method performs slightly better than others. 

Synthetic experiment 2 In this experiment, we generate one synthetic dataset. In dataset 3 again we use the public 
data resource as in the experiment 1, randomly select 180 voxels. We choose three shells with b values = 500, 1000 
,1500 s/mm 2 , and use 18 distinct gradients computed by electrostatic energy minimization algorithm which were 
shown to have the advantage of maintaining the optimal coverage of the complete scan in (7). The SNR is in the 
range [8,40] with noise (So/SNR) increasingly changing every 20 voxels to corrupt the generated signals. Again in 
this dataset we fix Sq = 1 . 

We compare the results from WLS, CWLS with user-defined Hessian matrices (CWLS), CWLS_SQP by us¬ 
ing SQP MATLAB solver with default Hessian values, as well as the least squares non-linear regression method 
(CWLS_LLS) by calling MATLAB function lsqnonlin and using the initials from WLS. We also discuss the 
choices of good initial values in Appendix [c] In order to compare the methods, we fix the estimates So and a 2 from 
WLS for all the method CWLSJLLS, CWLS and MLE. And finally we run the who scheme of MLE method (including 
update Sq and a 2 ) and record the computational time. 
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Figure 1: Mean square error (MSE) of mean diffusivity (MD, Fig. la), diffusion tensors (DT, Fig. lb), mean kurtosis 
(MK, Fig. lc) and kurtosis tensors (KT, Fig. Id) from dataset 3. In Fig. la and lb we compare all the methods, in 
Fig. lc, d we only list the results from the CWFS, CWFS_SQP and MFE as in the high noise range the results from 
the other two methods have reached out of the compared scales. 

Fig. |T|shows the performance of different methods from dataset 3 by MSE of the diffusion-scalar metrics, MD in 
Fig.la, DT in Fig.lb, MK in Fig. lc and KT in Fig. Id, respectively. In Fig. la and lb, we compare five different 
methods: WFS (red-break line), CWFS_FFS (blue-star line), CWFS (cyan line), CWFS_SQP (black-cross line) and 
MFE (magenta-circle line), the color can be seen on-line. In Fig. lc and Id, we only compare the performance by 
CWFS (cyan-star line), CWFS_SQP (black-cross line) and MFE (magenta-circle line), as in the high-noise level the 
estimates by the first two methods are far away from the comparing visible region (scale). All the figures clearly 
indicates that our MFE method has best performance among the listed metrics. For the very high-noise data, the 
proposed CWFS method shows larger MSE of MK and KT than using CWFS-SQP method with out given Hessian 
matrices, this situation may resulted from the contribution of the Hessian matrices calculated from the log-normal 
signal model. Furthermore, the percentages of voxels violating constraint #1, #2 and # 3 are 7.780% (with 14 out 
of 180 voxels are not satisfied positive constraint of rank-2 tensor), 55% and 0, respectively. After comparison, we 
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run the whole scheme of the EM-MLE method, and monitor the running time, where the algorithm in average needs 
5.1667 iterations to get convergence with 84.44% voxels using 5 iterations. 


Table 4: Parameters for biexponential diffusion model from normal human brains 


ROI 

D in [mm z 

■/s 

x 10 ~ J ] 

D ex [mm- 

'/s 

X 

o 

1 

UJ 

fin 

GM/CSF 

1.479 

± 

0.166 

0.466 

± 

0.017 

0.490 ± 0.012 

GM/WM 

1.142 

± 

0.106 

0.338 

± 

0.027 

0.622 ± 0.038 

TH 

1.320 

± 

0.164 

0.271 

± 

0.040 

0.617 ±0.069 

PU/GP 

1.609 

± 

0.039 

0.257 

± 

0.026 

0.648 ±0.028 

FWM 

1.155 

± 

0.046 

0.125 

± 

0.026 

0.648 ± 0.050 

ICWM 

1.215 

± 

0.024 

0.183 

± 

0.009 

0.637 ± 0.020 


The values of first six ROIs were taken from da. 


Table 5: Optimized 18 gradient directions 


0.737068 

-0.568030 

0.366160 

0.795763 

0.431108 

0.425331 

-0.822530 

0.367692 

0.433874 

0.000650 

0.985575 

0.169239 

0.228998 

0.150756 

0.961682 

-0.412439 

-0.753502 

0.511984 

-0.358616 

0.232844 

0.903979 

-0.891249 

-0.417614 

0.176844 

0.319924 

-0.498679 

0.805586 

0.309857 

0.667672 

0.676907 

0.579701 

-0.807043 

-0.112374 

-0.209598 

-0.358489 

0.909700 

0.990653 

-0.112342 

0.077367 

0.153276 

-0.903274 

0.400754 

0.530172 

0.845386 

0.065124 

-0.282930 

0.716688 

0.637423 

0.720077 

-0.052737 

0.691887 

-0.733882 

-0.178601 

0.655377 


This set of gradients were taken from (7), point set 1, which were computed by electrostatic energy minimization 
algorithm and shown good performance. 


Summary All the synthetic experiments were carried out on a 64-bit 4-core computer with 16 Gb RAM, and the 
CPU of each core is 3.40GHz with MATLAB. 

Table 6 : Comparison of the estimation time 


Dataset 4 

RT, EM-MLE 

Total iterations 

RT, CWLS 

RT, CWLS SQP 

RT, MLE* 

mean 

0.2425 

11 

1.4353 

0.7123 

0.6903 

max 

22.1953 

39 

6.7446 

2.4143 

3.6356 

min 

0.0205 

2 

0.2033 

0.0893 

0.0680 


The running time (RT) in average, minimum and maximum per voxel with unit second. The listed results are based on 
180 voxels from six different ROIs. The last column records the MLE method updating 6 only. The MLE* method 
seems to be as efficient as the CWLS_SQP method. With the EM-MLE scheme, in some voxels we need many 
iterations and some others only need a few to get convergence. Additionally, the EM-MEL scheme may have even 
shorter running time on average from some small datasets than that by the MLE, because in each iteration So and 
a 2 are also updated simultaneously to obtain the optimal values, which therefore may shorten the running time when 
updating d. 
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In Fig. [2] we show the estimated SNR of 180 voxels by both the WLS and the EM-MLE methods. As can be 
seen that the estimates (red cross) by the WLS have large bias in the low SNR region, and may appear some ouliers, 
e.g. the one marked by rectangle. In addition, they are overestimated and underestimated in the whole region of 
SNR. While the estimates (green circle) by the EN-MLE sheme performs quite well in the low SNR region. Then 
they fluctuate basically over the ground truth with an slight increase of deviation when the SNR is increasing. This 
is probably because we set quite loose tolerances for those parameters in order to shorten the iteration of the whole 
running scheme, which therefore results the converged estimates have not reach the optimals. 



Figure 2: SNR of 180 voxels estimated by the WLS and the EM-MLE methods. The blue line presents the ground 
truth, the red cross and the green circle show the estimates by the WLS and the EM-MLE methods, respectively. 


4.2 Real data 

This data are part of a real experiment. It is consist of 2204 diffusion MR-images of the brain from an healthy 
human volunteer, taken from four 5wra-thick consecutive axial slices, and measured by a Philips Achieva 3.0 Tesla 
MR-scanner. The image resolution is 128 x 128 pixels of size 1.875 x 1.875 mm 1 . After masking out the skull and 
the ventricles, we remain with a region of interest (ROI) containing 18764 voxels. In the protocol, we used all the 
combinations of the 32 gradient directions with the b -values varying in the range 0, 62, 249, 560, 996, 1556, 2240 
s/mm 2 , with 3 repetitions, for a total of 7 242 904 data points. 

Results In this session, we depict the results by MD Fig. 3, FA Fig. 4 as well as MK Fig. 5 from the proposed 
CWLS and MLE methods.The diffusion weighted MR data is in the range of (0, 581), acquired by 32 distinct gradient 
directions with seven different b values. After comparison, we can see that the image constrasts by the MLE method 
gain much more detailed stmctural information, especially in Fig.4 and Fig. 5 than those by the CWLS in the same 
scales. 


5 Discussion 

In this work, we propose an estimation scheme by the EM algorithm for MLE in contained DKI. We use the Rician 
noise model of signal measurements through data augmentation to conduct the DKI estimation, which plays crucial 
roles at low SNRs and leads less biased estimates both in theory and what has been observed in the experiments. Using 
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Figure 3: 3d maps of MD by the CWLS and the MLE methods from four consecutive slices of human brain.The MD 
maps were scaled between (0,6) x 10 ~ 3 mm 2 /s. 
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Figure 4: 3d maps of FA by the CWLS and the MLE methods from four consecutive slices of human brain.The FA 
maps is between (0,1). 
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Figure 5: 3d maps of MK by the CWLS and the MLE methods from four consecutive slices of human brain. The MK 
maps were scaled in the range of (0,4). 
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the state-of-the-art statistical methodology of data augmentation, we are able to work with a generalized linear model 
(GLM) of the joint likelihood derived from the Rician density. The positive constraints are imposed by Cholesky 
decomposition and the new parametrization of TQ for the 2nd order and kurtosis tensor, respectively. The whole 
scheme is not only for updating simultaneously the tensor parameters but updating the noise and the unattenuated 
signal. To apply this whole scheme in other simpler model such as CWLS or other DWI alternatives is straightforward. 

Using the Fisher scoring algorithm to solve optimization problem of the specific non-linear quartic regression 
problem from DKI, we can dramatically reduce the computational cost by deriving the gradient functions, the Hessian 
matrices and reducing the complexity of the Fisher information. Especially, Op is a function of L which provide possi¬ 
bility to calculate the essential Fisher information for updating the parameter L in the Fisher scoring method. Compare 
with the the observed information (or so-called empirical Fisher information) ^ {L), the Fisher information’s alge¬ 
braically simper formula, will lead substantially less computation, and it is much stable in the sense of being singular 
than the observed information matrix. Further details can be found in a. On the other hand, the barrier method 
provides the possibility to impose the non-linear constraints in implementation. The two methods combined together 
create a constrained Fisher scoring scheme for updating the tensor parameters in DKI. Furthermore, as reported in 
literature that implementation of the interior point method (with the common Newton and the barrier methods) can 
be very difficult. In this work we prefer the use of the Fisher scoring method instead of the Newton algorithm and 
applying the regularization technique to smooth the Hessian matrix by H(8 ^ k \+ ||S(0®)||) before calculating 
the Fisher information. As a consequence, the results show us that our constrained Fisher scoring scheme works very 
efficiently. 
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Appendix 

A Fisher scoring method for L 


Let’s Cj A) =exp(Z D; '0^)>V'}' C ' 1 =exp^(0. 

The score is of do is the first derivative of Eq. CD w.r.t. Oo given by 

m ( 

Vq (G D ) = (ct- 2 )« £ { ( 5«) 2 (Cf fiyffzj) 

j =i 

and the Hessian matrix is 

V 2 q(9 D ) = (cr- 2 )W £ { 2(S<f ) ) 2 (Cf fivffzlZo-S^xf^y^ZlZo 

j= i ^ 


M\ „,(*) 


zf =Yj(cos( ( pj)) ik \ 


and observed information 
The score of L expresses 


>) = — \7 2 <?(0d) is defined as the minus Hessian. 


V q(L) = (a“ 2 )» £ | {S [k) )\^f ) )\x i ,f) 2 zlj L -S (k \f ) ^ k] x l ffz T D J L 
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and the corresponding Hessian matrix is 


V 2 ?(£) = Jl (V 2 <i{0))Jl + Vq{0 D ) 


d 2 e D {L) 


dL k dL h 
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The Fisher information is given by 

(jf{L)W) :=E[-v 2 log^(y;0£,(L))] = 

- (ct- 2 )« £ | J[ (lisff^fiwffz^jZoj - S {k) z (k) Cf vfzljZo^) J L 

with the expectation at do, the current value of do, 

E[v<7 (0 d)] =0 and 

m ( n 

E[v 2 9(e D )MO (i| E 

Note that do is a function of L which provide possibility to calculate the essential Fisher information equalling to the 
expectation value of (or minus) Hessian matrix for updating L in the Fisher scoring method. Compared with the the 
observed information (or so-called empirical Fisher information) J? (V), the Fisher information’s algebraically simper 
formula, will lead substantially less computation, and it is much more stable in the sense of being singular than the 
observed information matrix. Details can be found in |9). 
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B Constrained Fisher scoring method 


Using the barrier method we form two Lagrangian of problems presented in Eq. >[T6|) and Eq. 17 Firstly, we need 


compute the score §(•) and set its components to be zero to find the necessary conditions of the optimal: 


§0 = v/(0)+^(0) T A=o, 

S v =< A, v >= /r, 

§1 =<?( 0 ) + v = 0 , 


where < • > is an operator of inner product, and A(0) := \jg(9) with dimension cl x m. In particular, we see why the 
barrier function is used in the logarithmic form. Then we need compute the Hessian matrix 

/H(9, A) 0 A(0) T \ 

H(6,V, A) = I 0 diag( A) diag(v) ] , 

\ A(0) Imxl 0 ) 

where diag(-) is diagonalizing operator to construct the vector to be m x m matrix. Applying the Fisher scoring 
method, we update 

A(* +J )^A«+J3^^ 's A , 

with - 11(9 .k). or some regularized form, e.g. mentioned in this work. 

§0 = A( 0 ) t (A (A:+ 1 ) - A (i_) ) - v/(0) +A( 0 ) T A 

and J r x=diag(v)/diag{ A), = A(0)(0^ +1 ^ - 0 W ) + g(0) +/i/A, 


where a is a positive primal parameter, and /j is a positive dual step parameter. To improve the convergence of the 
algorithm, the step parameters can be iteratively reduced by monitoring the logarithmic likelihood lfl3l . This is the 
so-called the Levengerg-Marquart algorithm. Beside the barrier parameter fi should be decrease as well during the 
iteration. All the above can be achieved by calling MATLAB optimization toolbox, function fmincon with the 
interior point method (IP). However in practice, for IP method can be very difficult to implement, if the selection of 
regularization technique, the step parameters, and the barrier parameter are not mutually consistent. In the sense, this 
algorithm requires skilful designs from users, including calculation of Hessian matrices, choices of regularization, 
choices of stopping criteria of step parameters regarding to a specific problem in order to make the algorithm works 
efficiently. 


C Choices of good initial values 


In this section, we discuss a possible solution to obtain good initial values of the tensor parameters fulfilling the 
positive constraints for saving the computational cost. 

Firstly, we can use the DTI approach to estimate the 2nd-order diffusion tensor, and then apply Cholesky decom¬ 
position to get the initials of L. When encountering non-positive definite diffusion tensor matrices ( D, 3 x 3), we can 
set the corresponding non-positive eigenvalues to be negligible and positive. In such a way, we gain positive definite 
D and preserve the directions of positive curvature in the original tensor matrices. 

In order to get good initial values for Q , we can call the Kurtosis model presented in Eq. 0 and calculate the 
kurtosis tensor 9w, and then construct the 6 x 6 Gram matrix ( G ) by the fifteen distinct elements in a 4th-order tensor 
matrix, denoted by W presented as a matrix from the tensor parameter %■. Since G are symmetric, we can define 

'?Wui2 ^Wni 3 d \ 

e W 2223 I , Q can possibly be obtained by solving the system of 

5 W 1333 2^2333/ 


W -+G = 


M 

N 


with N = 



equations 


TA =AT=N , 
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, with two 3x3 matrices, in 


where we apply the QR decomposition w.r.t matrix N to reformulate Q as Q = 

particular, T are lower triangular matrices, and some choices of parameters d, e. f in the Gram matrix can be found 
in 0 in order to make the rank of G equal to 3. Note that in such reformation, each Q contains the same number 
of distinct entries as W, i.e. fifteen instead of eighteen. The detailed interpretation can be found in 0. The above 
scheme can be conducted by the least squares(LS) and the weighted least squares WLS) methods without constraints, 
simultaneously, we get Sq and the noise parameter a 2 at each voxel. 
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